deepEA User Manual

(version 1.0)

Identification of RNA Modifications

This module provides the step-by-step functions required for mapping epitranscriptome reads and identifying RNA modifications.

Align Reads to Genome

Several commonly used aligners are wrapped to align epitranscriptome reads to the genome. The currently supported aligners are Tophat2, Bowtie2, STAR, HISAT2 and bwa-mem.

| Tools | Description | Input | Output | Time (test data) | Reference |
|---|---|---|---|---|---|
| Tophat2 | Tophat2 is a spliced aligner that aligns short reads by calling Bowtie2 but allows for variable-length indels with respect to the reference genome. | Epitranscriptome sequencing reads in FASTQ format and reference genome sequences in FASTA format | Read alignments in SAM/BAM format | ~50 s | Kim et al., 2013, Genome Biology |
| Bowtie2 | Bowtie2 is a short-read aligner that achieves a combination of high speed, sensitivity and accuracy by combining the strengths of the full-text minute index (FM index) with the flexibility and speed of hardware-accelerated dynamic programming algorithms. Bowtie2 is therefore suitable for large genomes. | Same as Tophat2 | Same as Tophat2 | ~10 s | Langmead et al., 2012, Nature Methods |
| STAR | STAR is an ultrafast universal RNA-Seq aligner that can discover non-canonical splices and chimeric (fusion) transcripts. | Same as Tophat2 | Same as Tophat2 | ~16 s | Dobin et al., 2013, Bioinformatics |
| HISAT2 | HISAT2 is an ultrafast spliced aligner with low memory requirements. It supports genomes of any size, including those larger than 4 billion bases. | Same as Tophat2 | Same as Tophat2 | ~8 s | Kim et al., 2015, Nature Methods |
| bwa-mem | bwa-mem is a relatively early aligner based on backward search with the Burrows–Wheeler Transform. | Same as Tophat2 | Same as Tophat2 | ~10 s | Li et al., 2009, Bioinformatics |

Identify RNA Modifications

Identify RNA Modifications implements three pipelines, one each for MeRIP-Seq, CeU-Seq and RNA-BSseq data.

| Tools | Description | Input | Output | Time (test data) | Reference |
|---|---|---|---|---|---|
| Peak Calling from the MeRIP-Seq data | Identify enriched genomic regions from a MeRIP-Seq experiment | Read alignments of IP and input in SAM/BAM format and reference genome sequences in FASTA format | RNA modifications in BED format | ~36 s | Zhai et al., 2018, Bioinformatics |
| Calling m5C from the RNA-BSseq data | Perform bisulfite sequencing (BS-Seq) read mapping and comprehensive methylation calling using meRanTK | Sequencing reads in FASTQ format and reference genome sequences in FASTA format | m5C sites in BED format | ~10 min using 2 threads | Rieder et al., 2016, Bioinformatics |
| Calling Ψ from CeU-Seq data | Identify pseudouridylation from CeU-Seq | Read alignments in SAM/BAM format and cDNA sequences in FASTA format | Pseudouridylation sites in BED format | ~1 min | Li et al., 2015, Nature Chemical Biology |

Align reads to genome

deepEA currently wraps five aligners to map epitranscriptome reads to the genome. Here we take Tophat2 as an example to show how to run read mapping with deepEA; the other four aligners are used in a similar way.
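For readers who want to see what a Tophat2 run looks like outside deepEA, the following is a minimal sketch of the equivalent command lines for single-end reads. It is an illustration only and not the wrapper code used by deepEA: the helper name tophat2_commands, the file names, the index base name and the thread count are all assumptions. The commands are built and printed rather than executed, so the sketch runs without Tophat2 installed.

```python
import shlex


def tophat2_commands(genome_fasta, reads_fastq, index_base="genome_index",
                     out_dir="tophat_out", threads=2):
    """Build (but do not run) the command lines for a single-end Tophat2 run.

    All argument values are placeholders chosen for illustration.
    """
    # Step 1: bowtie2-build <reference.fa> <index_base> creates the Bowtie2 index
    build = ["bowtie2-build", genome_fasta, index_base]
    # Step 2: tophat2 [options] <index_base> <reads.fastq>
    #   -p sets the number of threads, -o sets the output directory
    align = ["tophat2", "-p", str(threads), "-o", out_dir,
             index_base, reads_fastq]
    return build, align


build_cmd, align_cmd = tophat2_commands("genome.fa", "reads.fastq")
print(shlex.join(build_cmd))
print(shlex.join(align_cmd))
```

To actually run the two steps, each list could be passed to subprocess.run(cmd, check=True) on a machine where Bowtie2 and Tophat2 are installed.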

Input

Output

How to use this function

Peak calling from the MeRIP-Seq data

Peak calling identifies enriched genomic regions in MeRIP-Seq or ChIP-Seq experiments. This function is implemented using the peakCalling function in the PEA package (Zhai et al., 2018).

Input

Output

How to use this function

Calling m5C from the RNA-BSseq data

This function integrates meRanTK (Rieder et al., 2016, Bioinformatics) to perform RNA bisulfite sequencing (BS-Seq) read mapping and comprehensive methylation calling.
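As background on what methylation calling measures, the sketch below illustrates the general principle of bisulfite sequencing: unmethylated cytosines are converted and read as T, while methylated cytosines remain C, so the methylation level at a reference cytosine can be estimated as C / (C + T). This is not meRanTK's implementation, which applies its own filtering and statistics. The function name methylation_rate and the example bases are hypothetical.

```python
def methylation_rate(read_bases):
    """Estimate the methylation level at one reference cytosine.

    read_bases: string of bases observed in reads covering the position
    (hypothetical input; a real pipeline would derive it from alignments).
    Returns None when no C or T covers the position.
    """
    n_c = read_bases.count("C")  # unconverted, i.e. protected by methylation
    n_t = read_bases.count("T")  # converted, i.e. unmethylated
    covered = n_c + n_t
    if covered == 0:
        return None
    return n_c / covered


# Illustrative pileup: 2 reads show C and 8 reads show T at this position
rate = methylation_rate("CCTTTTTTTT")
print(rate)
```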

Input

Output

How to use this function

Calling Ψ from CeU-Seq data

This function identifies pseudouridylation from CeU-Seq data (Li et al., 2015). Specifically, for any given position i on a reference transcript, the stop rate is calculated as N_i_stop / (N_i_stop + N_i_readthrough), where N_i_stop (stop reads) is the number of reads whose mapped position starts at base i+1 (one nucleotide 3′ to position i), and N_i_readthrough (readthrough reads) is the number of reads that read through position i. A position i is then identified as Ψ only when all of the following criteria are met:
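The stop-rate formula above can be sketched in Python as follows. This is a simplified illustration rather than the code used by deepEA. It assumes that each read is given as a (start, end) pair of 1-based, inclusive transcript coordinates and that a read "reads through" position i when it covers that position. The function name stop_rates and the example reads are hypothetical, and the filtering criteria applied afterwards are not included.

```python
from collections import Counter


def stop_rates(reads, transcript_length):
    """Compute the stop rate for each covered position of one transcript.

    reads: list of (start, end) tuples, 1-based inclusive (assumed layout).
    Returns {position: N_stop / (N_stop + N_readthrough)}.
    """
    start_counts = Counter(start for start, _ in reads)
    rates = {}
    for i in range(1, transcript_length + 1):
        # Stop reads: mapped position starts at base i + 1
        n_stop = start_counts.get(i + 1, 0)
        # Readthrough reads: reads covering position i (assumption)
        n_readthrough = sum(1 for start, end in reads if start <= i <= end)
        total = n_stop + n_readthrough
        if total > 0:
            rates[i] = n_stop / total
    return rates


# Illustrative reads on a transcript of length 10
reads = [(3, 8), (3, 9), (1, 10), (1, 7)]
rates = stop_rates(reads, 10)
print(rates[2])  # two reads start at base 3, two reads cover base 2
```

In this example two reads start at base 3 and two reads cover base 2, so the stop rate at position 2 is 2 / (2 + 2) = 0.5.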

Input

Output

How to use this function